Social mimicry and small networks

George G. Vega Yon
(with Prof. Kayla de la Haye and Brooke Bell)
ggvy.cl

Center for Applied Network Analysis (CANA)
Department of Preventive Medicine

September 13, 2018

Today’s Talk

  1. Social Mimicry

  2. Small network (statistics)

But first, a detour!

Ranting about R vs SAS

We will look at the following myths:

  1. “SAS is easier than R”

  2. “SAS is required for drug tests by the FDA”

  3. “R is cool… but it can’t handle data out-of-memory like SAS”

  4. “SAS has a higher demand than R in the job market”

Ranting about R vs SAS

Myth: “SAS is easier than R”

Reality: Take a look at the simple task of importing a CSV file with a header, first in R and then in SAS

In R:

dataset <- read.csv("mydata.csv")

In SAS:

PROC IMPORT DATAFILE = "mydata.csv" OUT = dataset REPLACE;
  GETNAMES = yes;
run;

You be the judge…

Ranting about R vs SAS

Myth: “SAS is required for drug tests by the FDA”

Reality:

FDA does not require use of any specific software for statistical analyses, and statistical software is not explicitly discussed in Title 21 of the Code of Federal Regulations [e.g., in 21CFR part 11]. However, the software package(s) used for statistical analyses should be fully documented in the submission, including version and build identification. — FDA, May 6, 2015

Ranting about R vs SAS

Myth: “R is cool… but it can’t handle data out-of-memory like SAS”

Reality: Just take a look at the CRAN Task View for High Performance and Parallel Computing in R

biglm, ff, bigmemory, HadoopStreaming, speedglm, biglars, MonetDB.R, ffbase, LaF, bigstatsr

Ranting about R vs SAS

Myth: “SAS has a higher demand than R in the job market”

Source: KDnuggets

Source: Stack Overflow

As pointed out by David Smith on “Job trends for R and Python”

Source: indeed.com’s Job trends

Source: indeed.com’s Job trends

Ranting about R vs SAS

  1. “SAS is uglier (not easier) than R”

  2. “SAS is NOT required for drug tests by the FDA”

  3. “R is cool … AND CAN handle data out-of-memory like SAS”

  4. “SAS has a LOWER demand than R in the job market”

Social Mimicry

Context:

  • We observe families during a meal.

  • For each family member, we timestamp the moment at which she/he took a bite (took the fork/spoon filled with food to her/his mouth).

  • Theory predicts the emergence of classes of automatic behavior as a response to others’ bites (cues)

Two questions:

Is there any synchrony (bite rate) amongst family members?

Is there any mimicry amongst family members?

Simulated dyad

So we have, \(T_1 = \{t_1^1, t_1^2, t_1^3, t_1^4, t_1^5, t_1^6\}\), and \(T_2 = \{t_2^1, t_2^2, t_2^3\}\)

\[ R(t_2^2, T_1) = t_1^3 \]

Mathematically, we can describe the data as follows:

  • For each individual \(i\) we observe a vector \(T_i \equiv\{t_i^1, t_i^2,\dots\}\) with \(i\)’s bite timestamps. You can think of this as a Poisson process.

  • Let \(n_i\) denote the size of \(T_i\).

  • Also, define the function \(R:T_i\times T_j\mapsto T_j\) as the one that returns the closest bite of \(j\) to the left of a given bite of \(i\), i.e., the bite of \(j\) immediately before \(i\) took that particular bite:

\[ R(t_i^n, T_j) = \left\{\begin{array}{l} \mbox{Undefined},\quad\mbox{if }(\forall t_j^k\in T_j: t_j^k \leq t_i^n)\ \exists t_i^m\in T_i \mbox{ s.th. }t_i^m\in(t_j^k,t_i^n) \\ \arg\min_{\{t_j^k:t_j^k\in T_j, t_j^k \leq t_i^n\}}(t_i^n - t_j^k),\quad\mbox{otherwise.} \end{array}\right. \]

For now we will focus on the cases where this is defined.

  • A possible statistic to test for mimicry is the average time gap between \(i\)’s bites and the bites of \(j\) that immediately precede them. Formally, assuming \(R(t, T_j)\) is defined for all \(t\in T_i\), we have

    \[ S_{ij} = \frac{1}{n_i}\sum_{t\in T_i}(t - R(t, T_j)) \]

  • Given the data structure, we can use permutations to build a null distribution.
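The definitions above can be sketched in a few lines of base R. This is only an illustration, not the code used in the study: the function names `closest_left` and `S_ij` and the timestamps are made up, and, unlike the formula for \(S_{ij}\), the average here simply skips the bites for which \(R\) is undefined.

```r
# Closest bite of j at or before `bite`, or NA when R is "Undefined":
# either j has not taken a bite yet, or i took another bite after j's last one.
closest_left <- function(bite, T_i, T_j) {
  prev_j <- T_j[T_j <= bite]
  if (length(prev_j) == 0) return(NA_real_)
  r <- max(prev_j)
  # Undefined if i took a bite strictly between r and `bite`
  if (any(T_i > r & T_i < bite)) return(NA_real_)
  r
}

# Average gap between i's bites and j's preceding bites.
# Assumption of this sketch: undefined cases are dropped from the average.
S_ij <- function(T_i, T_j) {
  gaps <- vapply(
    T_i,
    function(bite) bite - closest_left(bite, T_i, T_j),
    numeric(1)
  )
  mean(gaps, na.rm = TRUE)
}

# Hypothetical timestamps (in seconds), for illustration only
T_1 <- c(1, 4, 9, 15)
T_2 <- c(2, 5, 6, 10)

# Closest left bite of individual 1 for each bite of individual 2
left_bites <- sapply(T_2, closest_left, T_i = T_2, T_j = T_1)
left_bites

# Average gap for the dyad (2 following 1)
S_ij(T_2, T_1)
```

In this toy example the third bite of individual 2 (at time 6) has no defined match, because individual 2 had already bitten at time 5 after individual 1's last bite at time 4.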

Permuting time intervals

  • Imagine we observe the following: \(T_{a} = \{0, 1, 3, 6\}\). Since what is swapped are the time intervals between consecutive bites (here 1, 2, and 3), there are \(3! = 6\) possible permutations.

Distribution of permuted set (50,000 permutations).
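To make the interval permutation concrete, here is a minimal base R sketch. The helper `permute_intervals` is a hypothetical name; it keeps the first timestamp fixed, shuffles the gaps between consecutive bites, and rebuilds the timestamps with a cumulative sum. The number of random draws (1,000) is arbitrary.

```r
# Shuffle the gaps between consecutive bites and rebuild the timestamps.
# The first timestamp (and therefore the last one) stays where it was.
permute_intervals <- function(times) {
  gaps <- diff(times)
  times[1] + c(0, cumsum(gaps[sample.int(length(gaps))]))
}

T_a <- c(0, 1, 3, 6)  # intervals: 1, 2, 3

# All 3! = 6 orderings of the three intervals, written out by hand
orders <- rbind(
  c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
  c(2, 3, 1), c(3, 1, 2), c(3, 2, 1)
)
all_perms <- t(apply(orders, 1, function(o) c(0, cumsum(diff(T_a)[o]))))
all_perms

# Random permutations, one per column, as would be used for a null distribution
set.seed(1)
null_draws <- replicate(1000, permute_intervals(T_a))
```

Each row of `all_perms` is one permuted set of timestamps; recomputing the statistic on every random draw is what yields a null distribution.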

Questions for Part 1

  • Going back to the case of Undefined left most close time, how can we use that information in our model? (example with 1,000 vs 2)

  • In some cases, subjects take relatively long pauses during the meal. What should we do with the solo time of the other dyad?

  • Another approach could be using an EM algorithm (the missing data is the failed mimic bite), and looking at the p-values of the missing data parameter (this implies modeling the data as a Poisson process with a probit component [to bite or not to bite]).

Part 2: Small Network

Context

  • We observe groups of individuals, teams performing different tasks.

  • Each group is composed of between 3 and 5 members.

  • The core question

How does your perception of the social network (your cognitive social structure) in which you are embedded impact your and your network’s performance?

Cognitive Social Structures

Alice, Bob, and Charlie are all friends. Thus, there are three separate representations of their network. If they each believe they are friends with the other two, but that the other two are not friends, then all three representations are distinct. – From wikipedia

  • Our current approach: We defined a statistic that measures some sort of correlation:

    \[ S_T\equiv \frac{1}{n(n-1)}\sum_{\{(i, j):(i,j)\in T, i<j\}}H(i,j) \]

    Where \(H(i,j)\) is the Hamming distance between \(i\) and \(j\)’s perceived social structures.

  • \(S_T\in[0,1]\), where 1 means perfect correlation (all subjects have the exact same perception of the graph.)

  • I call this cognitive dissonance (is this OK, Prof. de la Haye?).

A simple example

We will perform the following simulation experiment

  1. Draw a random graph of size \(n\)

  2. Generate \(n\) replicates of such graph, and rewire the endpoints with probability \(p\)

  3. Given the set of \(n\) graphs, compute \(S_T\)
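The three steps can be sketched in base R as follows. Several choices here are assumptions made only for illustration and are not taken from the talk: graphs are directed 0/1 matrices with density 0.5, "rewiring" moves the receiving end of a tie to a randomly chosen alter, and \(H(i,j)\) is taken to be the share of off-diagonal dyads on which two perceived graphs differ. Under that last assumption, identical perceptions give a value of 0.

```r
# Step 1: random directed graph of size n (no self-ties).
# The density of 0.5 is an arbitrary choice for this sketch.
random_graph <- function(n, density = 0.5) {
  g <- matrix(rbinom(n * n, 1, density), n, n)
  diag(g) <- 0
  g
}

# Step 2: with probability p, move the receiving end of each tie to a
# randomly chosen alter (one possible reading of "rewire the endpoints").
rewire <- function(g, p) {
  n <- nrow(g)
  ties <- which(g == 1, arr.ind = TRUE)
  for (k in seq_len(nrow(ties))) {
    if (runif(1) < p) {
      i <- ties[k, 1]
      alters <- setdiff(seq_len(n), i)
      new_j <- alters[sample.int(length(alters), 1)]
      g[i, ties[k, 2]] <- 0
      g[i, new_j] <- 1
    }
  }
  g
}

# Assumed H: share of off-diagonal dyads on which two graphs differ
hamming <- function(a, b) {
  off <- row(a) != col(a)
  mean(a[off] != b[off])
}

# Step 3: S_T as written in the talk, summing over pairs i < j
S_T <- function(graphs) {
  n <- length(graphs)
  total <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      total <- total + hamming(graphs[[i]], graphs[[j]])
    }
  }
  total / (n * (n - 1))
}

# One run of the experiment: n perceived versions of the same graph
simulate_S_T <- function(n, p) {
  g <- random_graph(n)
  S_T(replicate(n, rewire(g, p), simplify = FALSE))
}

# Average of S_T over 200 runs for groups of size 3, 4, and 5
set.seed(2018)
draws <- sapply(3:5, function(n) mean(replicate(200, simulate_S_T(n, p = 0.1))))
draws
```

Comparing the entries of `draws` across group sizes is one way to look at how the statistic depends on \(n\).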

Distribution of \(S_T\)

A naïve way to correct this is to use an orthogonal projection matrix on \(n\); in other words, remove everything that is correlated with \(n\):

\[ S_T' = \mathbf{M}_n S_T \]

Where \(\mathbf{M}_n = (I_n - n(n^\mathbf{t}n)^{-1}n^\mathbf{t})\)
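As a small illustration in base R, the projection can be built from the usual residual-maker matrix \(I - n(n^\mathbf{t}n)^{-1}n^\mathbf{t}\) and applied to a vector of statistics. The data are simulated and purely hypothetical: 50 groups of size 3 to 5, with one \(S_T\) value per group that depends on \(n\) by construction.

```r
set.seed(1)

# Hypothetical data: group sizes and one S_T value per group
n   <- sample(3:5, 50, replace = TRUE)
S_T <- 0.1 * n + rnorm(50, sd = 0.05)

# M_n = I - n (n' n)^{-1} n'
n_col <- matrix(n, ncol = 1)
M_n   <- diag(length(n)) - n_col %*% solve(crossprod(n_col)) %*% t(n_col)

# Projected statistic
S_T_prime <- as.vector(M_n %*% S_T)

# Same values as the residuals from regressing S_T on n without an intercept
same_as_lm <- all.equal(S_T_prime, unname(resid(lm(S_T ~ 0 + n))))
same_as_lm

# The projected statistic is orthogonal to n (zero up to rounding error)
sum(S_T_prime * n)
```

Seen this way, the correction amounts to keeping the residuals of a no-intercept regression of \(S_T\) on \(n\).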

Distribution of \(S_T\) using the orthogonal projection onto \(n\).

Questions

  • How can we correct the small sample bias?

  • Since networks are small, perhaps we could approximate via simulation (brute-force). Opinions?

  • An alternative is to use exact distributions. We could in principle assume that dyads are independent, and treat each observed 0/1 tie as Bernoulli distributed with parameter \(p_i\), where \(i\) indexes individuals. We would still need to make assumptions to compute \(p_i\).